In conventional document processing, ink-on-paper documents are scanned page by page into respective visual images in preparation for further processing. The resulting document file of scanned papers is typically a series of visual images of pages. The visual image of a page does not itself have accessible content; existing document processing applications may digitize certain visual image patterns into digitized data, which may then be accessed and operated on by use of a corresponding computer program application. Such a process of digitizing data from visual images is often referred to as extraction, or data extraction. In light of the amount of information represented in legacy paper forms and scanned document images, extraction from such document images may greatly affect general productivity in many areas of industry as well as society.